The Physics of Text: Ontological Realism in Information Extraction

Authors

  • Stuart J. Russell
  • Ole Torp Lassen
  • Justin Uang
  • Wei Wang
Abstract

We propose an approach to extracting information from text based on the hypothesis that text sometimes describes the world. The hypothesis is embodied in a generative probability model that describes (1) possible worlds and the facts they might contain, (2) how an author chooses facts to express, and (3) how those facts are expressed in text. Given text, information extraction is done by computing a posterior over the worlds that might have generated it. As a by-product, this unsupervised learning process discovers new relations and their textual expressions, extracts new facts, disambiguates instances of polysemous expressions, and resolves entity references. The probability model also explains and improves on Brin’s bootstrapping heuristic, which underlies many open information extraction systems. Preliminary results on a small corpus of New York Times text suggest that the approach is effective.
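The three-part generative story in the abstract can be illustrated with a toy sketch. This is not the authors' model: the candidate facts, the prior, the pattern probabilities, and the rule that the author picks one true fact uniformly at random are all assumptions made up for illustration. The posterior over worlds is computed by brute-force enumeration, which only works for a handful of candidate facts.

```python
from itertools import product

# Hypothetical toy domain: every name and number here is an assumption.
CANDIDATE_FACTS = [
    ("born_in", "Ada", "London"),
    ("lives_in", "Ada", "London"),
]
PRIOR_TRUE = 0.3  # assumed prior probability that a candidate fact holds

# Assumed P(pattern | relation): how each relation tends to be worded.
# "X is from Y" is deliberately ambiguous between the two relations.
PATTERN_GIVEN_RELATION = {
    "born_in": {"X was born in Y": 0.8, "X is from Y": 0.2},
    "lives_in": {"X lives in Y": 0.7, "X is from Y": 0.3},
}


def world_prior(world):
    """(1) Prior over worlds: each candidate fact holds independently."""
    p = 1.0
    for fact in CANDIDATE_FACTS:
        p *= PRIOR_TRUE if fact in world else 1.0 - PRIOR_TRUE
    return p


def sentence_likelihood(sentence, world):
    """(2) + (3): pick a true fact uniformly, then pick a wording for it."""
    if not world:
        return 0.0  # an empty world gives the author nothing to say
    pattern, x, y = sentence
    total = 0.0
    for rel, fx, fy in world:
        if (fx, fy) == (x, y):
            total += PATTERN_GIVEN_RELATION[rel].get(pattern, 0.0) / len(world)
    return total


def posterior_over_worlds(sentences):
    """Posterior over all subsets of CANDIDATE_FACTS given the sentences."""
    weights = {}
    for mask in product([False, True], repeat=len(CANDIDATE_FACTS)):
        world = frozenset(f for f, keep in zip(CANDIDATE_FACTS, mask) if keep)
        weight = world_prior(world)
        for sentence in sentences:
            weight *= sentence_likelihood(sentence, world)
        weights[world] = weight
    z = sum(weights.values())
    if z == 0.0:
        raise ValueError("no candidate world can explain the sentences")
    return {world: w / z for world, w in weights.items()}


def fact_marginal(posterior, fact):
    """Posterior probability that a single fact holds."""
    return sum(p for world, p in posterior.items() if fact in world)


if __name__ == "__main__":
    post = posterior_over_worlds([("X is from Y", "Ada", "London")])
    for fact in CANDIDATE_FACTS:
        print(fact, round(fact_marginal(post, fact), 3))
```

Under these assumed numbers, the ambiguous sentence "X is from Y" leaves both relations plausible, with `lives_in` somewhat more likely. Adding an unambiguous sentence such as "X was born in Y" rules out every world that lacks the `born_in` fact, which is a small-scale version of the disambiguation the abstract describes.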


Similar resources

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...


Scientific Realism and High Energy Physics

The paper discusses major implications of high energy physics for the scientific realism debate. The first part analyses the ways in which aspects of the empirically well-confirmed standard model of particle physics are relevant for a reassessment of entity realism, ontological realism and structural realism. The second part looks at the implications of more far-reaching concepts like string th...


Observe the Split Between the Paths: from Persian Tadhkirah to magical realism: A discourse in the review of Mystical Realism by Mehrnaz Shirazi Adel

Mehrnaz Shirazi Adel, Ph.D. student of Persian Literature at the Institute of Humanities and Cultural Studies. Abstract: Mystical Realism; A Comparison of Sufi Tadhkirah Writing and Magical Realism with Emphasis on Marquez's Works by Mohammad Roodgar is in effect his doctoral thesi...


Document Clustering Based on Ontology and a Fuzzy Approach

Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...


The Absence of ‘Paucity’ & ‘Momentariness’: Two New Components of Magical Realism in Günter Grass's The Tin Drum

This article asks whether it is correct to classify Günter Grass’s The Tin Drum as a work of magical realism. A brief scrutiny of the elements of magical realism, particularly Authorial Reticence and the concept of Hesitation, indicates that, contrary to the advertisement of certain sources and publishers, this novel in certain circumstances contradicts and opposes these two indispe...




Publication date: 2016